ABSTRACT

Context. Since its formulation by Hamaker et al., the radio interferometer measurement equation (RIME) has provided a rigorous 
mathematical basis for the development of novel calibration methods and techniques, including various approaches to the problem of 
direction-dependent effects (DDEs). However, acceptance of the RIME in the radio astronomical community at large has been slow, 
which is partially due to the limited availability of software to exploit its power, and the sparsity of practical results. This needs to 
change urgently. 

Aims. This series of papers aims to place recent developments in the treatment of DDEs into one RIME-based mathematical framework, and to demonstrate the ease with which the various effects can be described and understood. It also aims to show the benefits of a RIME-based approach to calibration.

Methods. Paper I re-derives the RIME from first principles, extends the formalism to the full-sky case, and incorporates DDEs. Paper 
II then uses the formalism to describe self-calibration, both with a full RIME, and with the approximate equations of older software 
packages, and shows how this is affected by DDEs. It also gives an overview of real-life DDEs and proposed methods of dealing with 
them. Finally, in Paper III some of these methods are exercised to achieve an extremely high-dynamic range calibration of WSRT 
observations of 3C 147 at 21 cm, with full treatment of DDEs. 

Results. The RIME formalism is extended to the full-sky case (Paper I), and is shown to be an elegant way of describing calibration 
and DDEs (Paper II). Applying this to WSRT data (Paper III) results in a noise-limited image of the field around 3C 147 with a very 
high dynamic range (1.6 million), and none of the off-axis artifacts that plague regular selfcal. The resulting differential gain solutions 
contain significant information on DDEs and errors in the sky model. 

Conclusions. The RIME is a powerful formalism for describing radio interferometry, and underpins the development of novel calibration methods, in particular those dealing with DDEs. One of these is the differential gains approach used for the 3C 147 reduction. Differential gains can eliminate DDE-related artifacts, and provide information for iterative improvements of sky models. Perhaps most importantly, sources as faint as 2 mJy have been shown to yield meaningful differential gain solutions, and thus can be used as potential calibration beacons in other DDE-related schemes.

Key words. Methods: numerical - Methods: analytical - Methods: data analysis - Techniques: interferometric - Techniques: polarimetric
Introduction to the series

The Measurement Equation of a generic radio interferometer (henceforth referred to as the RIME) was formulated by Hamaker et al. (1996) after almost 50 years of radio astronomy. Prior to the RIME, mathematical models of radio interferometers (as implemented by a number of software packages such as AIPS, Miriad, NEWSTAR and DIFMAP) were somewhat ad hoc and approximate. Despite this (and in part thanks to the careful design of existing instruments), the technique of self-calibration (Cornwell & Wilkinson 1981) has allowed radio astronomers to achieve spectacular results. However, by the time the RIME was formulated, even older and well-understood instruments such as the Westerbork Synthesis Radio Telescope (WSRT) and the Very Large Array (VLA) were beginning to expose the limitations of these approximate models. New instruments (and upgrades of older observatories), such as the current crop of Square Kilometre Array (Schilizzi 2004) "pathfinders", and indeed the SKA itself, were already beginning to loom on the horizon. These new instruments exhibit far more subtle and elaborate observational effects, due not only to their greatly increased sensitivity, but also to new features of their design. In particular, while traditional selfcal only deals with direction-independent effects (DIEs), calibration of these new instruments requires us to deal with direction-dependent effects (DDEs), or effects that vary across the field of view (FoV) of the instrument. Following Noordam & Smirnov (2010), I shall refer to generations of calibration methods, with first-generation calibration (1GC) predating selfcal, 2GC being traditional selfcal as implemented by the aforementioned packages, and 3GC corresponding to the burgeoning field of DDE-related methods and algorithms.

It is indeed quite fortunate that the emergence of the RIME formalism has provided us with a complete and elegant mathematical framework for dealing with observational effects, and ultimately DDEs. Oddly enough, outside of a small community of algorithm developers who have enthusiastically accepted the formalism and put it to good use, uptake of the RIME by radio astronomers at large has been slow. Even more worryingly, almost 15 years after the first publication, the formalism is hardly ever taught to the new generation of students. This matters, because in my estimation the RIME should be the cornerstone of every entry-level interferometry course! In part, this slow acceptance has been shaped by the availability of software. Today's radio astronomers rely almost exclusively on the 2GC software packages mentioned above, whose internal paradigms are rooted in the selfcal developments of the 1980s and lack an explicit RIME¹.
On the other hand, relatively few observations were really sensitive enough to push the limits of (or have their science goals compromised by) 2GC. The continued success of legacy packages has meant that thinking about interferometry and calibration is still largely shaped by pre-RIME paradigms. The situation has not been helped by the fact that new software exploiting the power of the RIME has been slow to emerge, and practical results even more so (but see Paper III of this series; Smirnov 2011b).

On the other hand, from my personal experience of teaching the RIME at several workshops, once the penny drops, people tend to describe it in terms such as "obvious", "simple", "intuitive", "elegant" and "powerful". This points to an explanatory gap in the literature. Paper I of this series therefore tries to address this gap, recasting existing ideas into one consistent mathematical framework, and showing where other approaches to the RIME fit in. It first revisits the ideas of the original RIME papers (Hamaker et al. 1996; Hamaker 2000), deriving the RIME from first principles. It then demonstrates how the fundamentals of interferometry itself (and the van Cittert-Zernike theorem in particular) follow from the RIME (rather than the other way around!), in the process showing how the formalism can incorporate DDEs. Paper I also looks at alternative formulations of the RIME and their practical implications, and shows where they fit into the formalism. It also tries to clear up some controversies and misunderstandings that have accumulated over the years. Paper II (Smirnov 2011a) then discusses calibration in RIME terms, and explicates the links between the RIME and 2GC implementations of selfcal.

Paper II also discusses the subject of DDEs, and places existing approaches into the mathematical framework developed in the preceding sections. DDEs were outside the scope of the original RIME publications, but various authors have been incorporating them into the RIME since. Rau et al. (2009) and Bhatnagar (2009) provide an in-depth review of these developments, especially as pertaining to imaging and deconvolution. These authors have developed a description of DDEs using the 4 × 4 Mueller matrix and coherency vector formalism of the first RIME paper by Hamaker et al. (1996). The 4 × 4 formalism has also been included in the 2nd edition of Thompson, Moran, & Swenson, Jr. (2001, Sect. 4.8). In the meantime, Hamaker (2000) has recast the RIME using only 2 × 2 matrices. The 2 × 2 form of the RIME has far more intuitive appeal², and is far better suited for describing calibration problems, yet it has been somewhat unjustly ignored in the literature. Addressing this perceived injustice is yet another aim of these papers. (Section 6 describes the 4 × 4 vs. 2 × 2 formalisms in more detail.)

Last but certainly not least, Paper III (Smirnov 2011b) shows an application of these concepts to real data. It presents a record dynamic range (over 1.6 million) calibration of a WSRT observation, including calibration of DDEs. It then analyzes the results of this calibration, shows how the calibration solutions can be used to improve sky models, and demonstrates a rather important implication for the calibratability of future telescopes.

1 All 2GC packages do use some specific and limited form of the RIME implicitly. This will be discussed further in Paper II (Smirnov 2011a).

2 This (admittedly subjective) judgment is firmly based on personal experience of teaching the RIME.

1. The RIME of a single source

Like many crucial insights, the RIME seems perfectly obvious and simple in hindsight. In fact, it can be almost trivially derived from basic considerations of signal propagation, as shown by Hamaker et al. (1996). In this paper, I will essentially repeat and elaborate on this derivation. This is not original work, but there are several good reasons for reiterating the full argument, as opposed to simply referring back to the original RIME papers. Firstly, some aspects of the basic RIME noted here are not covered by the original papers at all. These are the commutation considerations of Sect. 1.6, the fact that Jones matrices and coherency matrices behave differently under coordinate transforms (for which reason I even propose a different typographical convention for them), as discussed in Sect. 6.3, and the 1/2-vs.-1 controversy of Sect. 7.2. Secondly, the 2 × 2 version of the formalism proposed by Hamaker (2000) and employed here provides a much clearer and more intuitive picture than the original 4 × 4 derivation (see Sect. 6.1 for a discussion), and so deserves far more exposure in the literature than the sole Hamaker paper to date. Finally, I want to establish some typographical conventions and mathematical nomenclature, and lay the groundwork for my own extensions of the formalism, which start at Sect. 3. This seemed sufficient reason to give a complete derivation of the RIME from scratch.

In Sects. 2 and 3 I extend the 2 × 2 formalism into the image-plane domain, show how the van Cittert-Zernike (VCZ) theorem naturally follows from the RIME, and sketch the problem of DDEs. Section 4 elaborates some RIME-based closure relationships, Sect. 5 then examines some important limitations and boundaries of the RIME formalism, and Sect. 6 looks at alternative formulations of the RIME. Finally, Sect. 7 attempts to clear up some errors and controversies surrounding the formalism.

1.1. Signal propagation 

Consider a single source of quasi-monochromatic signal (i.e. a sky consisting of a single point source). The signal at a fixed point in space and time can then be described by the complex vector e. Let us pick an orthonormal xyz coordinate system, with z along the direction of propagation (i.e. from antenna to source). In such a system, e can be represented by a column vector of 2 complex numbers:

$$
\boldsymbol{e} = \begin{pmatrix} e_x \\ e_y \end{pmatrix}
$$
Our fundamental assumption is linearity: all transformations along the signal path are linear w.r.t. e. Basic linear algebra tells us that all linear transformations of a 2-vector can be represented (in any given coordinate system) by a matrix multiplication:

$$
\boldsymbol{e}' = \boldsymbol{J}\boldsymbol{e},
$$

where J is a 2 × 2 complex matrix known as the Jones matrix (Jones 1941). Obviously, multiple effects along the signal propagation path correspond to repeated matrix multiplications, forming what I call a Jones chain. We can regard multiple effects separately and write out Jones chains, or we can collapse them all into a single cumulative Jones matrix, as convenient:

$$
\boldsymbol{e}' = \boldsymbol{J}_n \boldsymbol{J}_{n-1} \cdots \boldsymbol{J}_1 \boldsymbol{e} = \boldsymbol{J}\boldsymbol{e} \tag{1}
$$



O.M. Smirnov: Revisiting the RIME. I. A full-sky Jones formalism 






The order of terms in a Jones chain corresponds to the phys- 
ical order in which the effects occur along the signal path. Since 
matrix multiplication does not (in general) commute, we must 
be careful to preserve this order in our equations. 
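The bookkeeping of Eq. (1), and the reason the order of a Jones chain matters, can be checked numerically. The following is a minimal Python/NumPy sketch; the helper names `rotation` and `jones_chain`, and the numerical values of the two example terms, are illustrative assumptions rather than part of the formalism.

```python
import numpy as np

def rotation(phi):
    """2x2 rotation matrix through angle phi (radians)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]], dtype=complex)

def jones_chain(*terms):
    """Collapse a Jones chain into one total Jones matrix, as in Eq. (1).

    Terms are listed in physical order along the signal path (J_1 first),
    so each new term multiplies the accumulated product from the left.
    """
    total = np.eye(2, dtype=complex)
    for term in terms:
        total = term @ total
    return total

# Hypothetical terms: a rotation (e.g. Faraday rotation) encountered first,
# then a diagonal electronic gain with unequal gains on the two feeds.
F = rotation(0.3)
G = np.diag([1.2 + 0.1j, 0.8 - 0.2j])

e = np.array([1.0 + 0.0j, 0.5j])      # an arbitrary signal 2-vector
J = jones_chain(F, G)                 # total Jones matrix J = G F

# Applying the collapsed matrix equals applying the terms one by one.
print(np.allclose(J @ e, G @ (F @ e)))   # True
# Swapping the order gives a different result: G F != F G here.
print(np.allclose(G @ F, F @ G))         # False
```

Accumulating by left-multiplication keeps the argument list in physical order while producing the matrix product in the order written in Eq. (1).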

Now, the signal hits our antenna and is ultimately converted into complex voltages by the antenna feeds. Let us further assume that we have two feeds a and b (for example, two linear dipoles, or left/right circular feeds), and that the voltages v_a and v_b are linear w.r.t. e. We can formally treat the two voltages as a voltage vector v, analogous to e. Their linear relationship is yet another matrix multiplication:

$$
\boldsymbol{v} = \begin{pmatrix} v_a \\ v_b \end{pmatrix} = \boldsymbol{J}\boldsymbol{e} \tag{2}
$$

Equation (2) can be thought of as representing the fundamental linear relationship between the voltage vector v as measured by the antenna feeds, and the "original" signal vector e at some arbitrarily distant point, with J being the cumulative product of all propagation effects along the signal path (including electronic effects in the antenna/feed itself). I shall refer to this J as the total Jones matrix, as distinct from the individual Jones terms in a Jones chain.

1.2. The visibility matrix

Two spatially separated antennas p and q measure two independent voltage vectors v_p, v_q. In an interferometer, these are fed into a correlator, which produces 4 pairwise correlations between the components of v_p and v_q:

$$
\langle v_{pa} v_{qa}^* \rangle,\quad \langle v_{pa} v_{qb}^* \rangle,\quad \langle v_{pb} v_{qa}^* \rangle,\quad \langle v_{pb} v_{qb}^* \rangle \tag{3}
$$

Here, angle brackets denote averaging over some (small) time and frequency bin, and x* is the complex conjugate of x. It is convenient for our purposes to arrange these four correlations into the visibility matrix³ V_pq:

$$
\mathsf{V}_{pq} = 2\begin{pmatrix} \langle v_{pa} v_{qa}^* \rangle & \langle v_{pa} v_{qb}^* \rangle \\ \langle v_{pb} v_{qa}^* \rangle & \langle v_{pb} v_{qb}^* \rangle \end{pmatrix}
$$

I introduce a factor of 2 here, for reasons explained in Sect. 7.2. It is easily seen that V_pq can be written as a matrix product of v_p (as a column vector) and the conjugate of v_q (as a row vector):

$$
\mathsf{V}_{pq} = 2\left\langle \begin{pmatrix} v_{pa} \\ v_{pb} \end{pmatrix} \begin{pmatrix} v_{qa}^* & v_{qb}^* \end{pmatrix} \right\rangle = 2\langle \boldsymbol{v}_p \boldsymbol{v}_q^H \rangle \tag{4}
$$

Here, H represents the conjugate transpose operation (also called a Hermitian transpose).

3 Hamaker (2000) calls V_pq the coherency matrix, in order to distinguish it from traditional scalar visibilities. Since the elements of the matrix are precisely the complex visibilities, I submit visibility matrix as a more logical term.

1.3. The RIME emerges

Starting with some arbitrarily distant vector e, our signal travels along two different paths to antennas p and q. Following Eq. (2), each propagation path has its own total Jones matrix, J_p and J_q. Combining Eqs. (2) and (4), we get:

$$
\mathsf{V}_{pq} = 2\langle \boldsymbol{J}_p \boldsymbol{e} (\boldsymbol{J}_q \boldsymbol{e})^H \rangle = 2\langle \boldsymbol{J}_p (\boldsymbol{e}\boldsymbol{e}^H) \boldsymbol{J}_q^H \rangle \tag{5}
$$

Assuming that J_p and J_q are constant over the averaging interval⁴, we can move them outside the averaging operator:

$$
\mathsf{V}_{pq} = 2\boldsymbol{J}_p \langle \boldsymbol{e}\boldsymbol{e}^H \rangle \boldsymbol{J}_q^H = \boldsymbol{J}_p \, 2\begin{pmatrix} \langle e_x e_x^* \rangle & \langle e_x e_y^* \rangle \\ \langle e_y e_x^* \rangle & \langle e_y e_y^* \rangle \end{pmatrix} \boldsymbol{J}_q^H \tag{6}
$$

The bracketed quantities here are intimately related to the definition of the Stokes parameters (Born & Wolf 1964; Thompson et al. 2001). Hamaker & Bregman (1996) explicitly show that

$$
2\begin{pmatrix} \langle e_x e_x^* \rangle & \langle e_x e_y^* \rangle \\ \langle e_y e_x^* \rangle & \langle e_y e_y^* \rangle \end{pmatrix} = \begin{pmatrix} I + Q & U + iV \\ U - iV & I - Q \end{pmatrix} = \mathsf{B} \tag{7}
$$

I now define the brightness matrix B as the right-hand side⁵ of Eq. (7). This gives us the first form of the RIME, that of a single point source:

$$
\mathsf{V}_{pq} = \boldsymbol{J}_p \mathsf{B} \boldsymbol{J}_q^H \tag{8}
$$

Or in expanded form:

$$
\begin{pmatrix} V_{aa} & V_{ab} \\ V_{ba} & V_{bb} \end{pmatrix} = \begin{pmatrix} j_{11p} & j_{12p} \\ j_{21p} & j_{22p} \end{pmatrix} \begin{pmatrix} I + Q & U + iV \\ U - iV & I - Q \end{pmatrix} \begin{pmatrix} j_{11q} & j_{12q} \\ j_{21q} & j_{22q} \end{pmatrix}^H
$$

which quite elegantly ties together the observed visibilities V_pq with the intrinsic source brightness B, and the per-antenna terms J_p and J_q.

Note that Eq. (8) holds in any coordinate system. The vector e, the brightness matrix B that is derived from it, and the linear transformations J_p and J_q are distinct mathematical entities that are independent of coordinate systems; choosing a coordinate basis associates a specific representation with e, B and J, manifesting itself in a 2-vector or a 2 × 2 matrix populated with specific complex numbers. For example, it is quite possible (and sometimes desirable) to rewrite the RIME in a circular polarization basis. This is discussed further in Sect. 6.3. In this paper, I shall use an orthonormal xyz basis unless otherwise stated.
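To make Eqs. (7) and (8) concrete, here is a minimal NumPy sketch, assuming the linear (xy) basis and the factor-of-2 convention used above. The function names are hypothetical, and the linear-to-circular matrix T is one common choice whose sign convention is an assumption. The last check illustrates the claim that Eq. (8) holds in any coordinate system: if vectors transform as e' = T e, then Jones matrices transform as T J T⁻¹ and brightness/visibility matrices as T B Tᴴ, and the RIME evaluated in the new basis returns the transformed visibilities.

```python
import numpy as np

def brightness(I, Q, U, V):
    """Brightness matrix B of Eq. (7), linear xy basis, no factor 1/2."""
    return np.array([[I + Q, U + 1j * V],
                     [U - 1j * V, I - Q]], dtype=complex)

def stokes(B):
    """Invert Eq. (7): recover (I, Q, U, V) from a brightness matrix."""
    I = 0.5 * (B[0, 0] + B[1, 1]).real
    Q = 0.5 * (B[0, 0] - B[1, 1]).real
    U = 0.5 * (B[0, 1] + B[1, 0]).real
    V = (0.5 * (B[0, 1] - B[1, 0]) / 1j).real
    return I, Q, U, V

def rime_point_source(Jp, B, Jq):
    """Eq. (8): V_pq = J_p B J_q^H."""
    return Jp @ B @ Jq.conj().T

# A single fully polarised field vector gives B = 2 e e^H, a rank-1 matrix,
# so det B = I^2 - Q^2 - U^2 - V^2 vanishes.
e = np.array([0.6 + 0.2j, 0.3 - 0.5j])
B_field = 2 * np.outer(e, e.conj())
I, Q, U, V = stokes(B_field)
print(np.isclose(I**2, Q**2 + U**2 + V**2))   # True

# Arbitrary (hypothetical) per-antenna Jones matrices and source.
Jp = np.array([[1.1, 0.05j], [-0.02, 0.9 + 0.1j]])
Jq = np.array([[0.95 - 0.1j, 0.0], [0.03j, 1.05]])
B = brightness(1.0, 0.2, -0.1, 0.05)
Vpq = rime_point_source(Jp, B, Jq)

# Change of basis T (assumed linear-to-circular convention).
T = np.array([[1, 1j], [1, -1j]]) / np.sqrt(2)
T_inv = np.linalg.inv(T)
Vpq_circ = rime_point_source(T @ Jp @ T_inv, T @ B @ T.conj().T, T @ Jq @ T_inv)
print(np.allclose(Vpq_circ, T @ Vpq @ T.conj().T))   # True
```

The basis check holds for any invertible T, not only unitary ones, because the T⁻¹ and T factors cancel pairwise inside the product.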



1.4. Some typographical conventions

Throughout this series of papers, I shall adopt the following typographical conventions for formulas:

Scalar quantities will be indicated by lower- and uppercase italics: $e_x$, $I$, $K_p$.

Vectors will be indicated by lowercase bold italics: $\boldsymbol{e}$.

Jones matrices will be indicated by uppercase bold italics: $\boldsymbol{J}$. As a special case, scalar matrices (Sect. 1.6) will be indicated by normal-weight italics: $K_p$.

Visibility, coherency and brightness matrices will be indicated by sans-serif font: $\mathsf{B}$, $\mathsf{V}_{pq}$, $\mathsf{X}_{pq}$. This emphasizes their different mathematical nature (and in particular, that they transform differently under change of coordinate frame, Sect. 6.3).



4 This is a crucial assumption, which I will revisit in Sect. 5.2.

5 Following a long-standing controversy, I have decided to break with Hamaker (2000) by omitting ½ from the definition of B, and adding a factor 2 to the definition of V_pq in Eq. (4). The reasons for this will be spelled out in Sect. 7.2.
1.5. The "onion" form

We can also choose to expand J_p and J_q into their associated Jones chains, as per Eq. (1). This results in the rather pleasing "onion" form of the RIME:

$$
\mathsf{V}_{pq} = \boldsymbol{J}_{pn}\Big(\cdots\big(\boldsymbol{J}_{p2}(\boldsymbol{J}_{p1}\mathsf{B}\boldsymbol{J}_{q1}^H)\boldsymbol{J}_{q2}^H\big)\cdots\Big)\boldsymbol{J}_{qm}^H \tag{9}
$$

Intuitively, this corresponds to various effects in the signal path applying sequential layers of "corruptions" to the original source brightness B. Note that the two signal paths can in principle be entirely dissimilar, making the "onion" asymmetric (hence the use of different indices, n and m, for the outermost terms). An example of this is VLBI with ad hoc arrays composed of different types of telescopes. One of the strengths of the RIME is its ability to describe heterogeneous interferometer arrays with dissimilar signal propagation paths.

1.6. An elementary Jones taxonomy

Different propagation effects are described by different kinds of Jones matrices. The simplest kind of matrix is a scalar matrix, corresponding to a transformation that affects both components of the e vector equally. I shall use normal-weight italics (K) to emphasize scalar matrices. An example is the phase delay matrix below:

$$
K = e^{i\kappa} = \begin{pmatrix} e^{i\kappa} & 0 \\ 0 & e^{i\kappa} \end{pmatrix} = e^{i\kappa}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
$$

An important property of scalar matrices is that they have the same representation in all coordinate systems, so scalarity is defined independently of coordinate frame.

Diagonal matrices correspond to effects that act on the two e components independently, without intermixing. Note that unlike scalarity, diagonality does depend on the choice of coordinate system. For example, if we consider linear dipoles, their electronic gains are (nominally) independent, and the corresponding Jones matrix is diagonal in an xy coordinate basis:

$$
\boldsymbol{G} = \begin{pmatrix} g_x & 0 \\ 0 & g_y \end{pmatrix}
$$

The gains of a pair of circular receptors, on the other hand, are not diagonal in an xy frame (but are diagonal in a circular polarization frame; see Sect. 6.3).

Matrices with non-zero off-diagonal terms intermix the two components of e. A special case of this is the rotation matrix:

$$
\mathrm{Rot}\,\phi = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}
$$

Like diagonality, the property of being a rotation matrix also depends on the choice of coordinate frame. Examples of rotation matrices (in an xy frame) are rotation through parallactic angle, P, and Faraday rotation in the ionosphere, F. Note also that rotation in an xy frame becomes a special kind of diagonal matrix in the circular frame (see Sect. 6.3).

It is important for our purposes that, while in general matrix multiplication is non-commutative, specific kinds of matrices do commute:

1. Scalar matrices commute with everything.

2. Diagonal matrices commute among themselves.

3. Rotation matrices commute among themselves⁶.



Rules 2 and 3 are not very satisfactory as stated, because "diagonal" and "rotation" are properties defined in a specific coordinate frame, while (non-)commutation is defined independently of coordinates: two linear operators A and B either commute or they don't, so their matrix representations must necessarily commute (or not) irrespective of what they look like for a particular basis. Let us adopt a practical generalization:

The Commutation Rule: if there exists a coordinate basis in which A and B are both diagonal (or both a rotation⁷), then AB = BA in all coordinate frames.

We shall be making use of commutation properties later on. 
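The three commutation rules, and the basis-independence that motivates the Commutation Rule, can be checked numerically. A minimal NumPy sketch; the matrices are arbitrary illustrative values, S is an arbitrary invertible change of basis, and the linear-to-circular matrix T uses an assumed sign convention.

```python
import numpy as np

def rot(phi):
    """2x2 rotation matrix through angle phi (radians)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]], dtype=complex)

def commutes(A, B):
    """True if AB = BA to numerical precision."""
    return np.allclose(A @ B, B @ A)

K = np.exp(0.7j) * np.eye(2)             # scalar matrix (rule 1)
G1 = np.diag([1.1 + 0.2j, 0.9 - 0.1j])   # diagonal matrices (rule 2)
G2 = np.diag([0.7j, 1.3 + 0j])
P, F = rot(0.4), rot(-1.1)               # rotations (rule 3)

print(commutes(K, G1), commutes(K, P))   # True True
print(commutes(G1, G2), commutes(P, F))  # True True
print(commutes(G1, P))                   # False: diagonal vs. rotation

# A change of basis S changes what the matrices look like,
# but not whether they commute.
S = np.array([[1.0, 0.5j], [0.2, 1.0 - 0.3j]])
S_inv = np.linalg.inv(S)
print(commutes(S @ G1 @ S_inv, S @ G2 @ S_inv))   # True

# An xy rotation becomes diagonal in a circular basis.
T = np.array([[1, 1j], [1, -1j]]) / np.sqrt(2)
D = T @ P @ np.linalg.inv(T)
print(np.allclose(D, np.diag(np.diag(D))))        # True
```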



1.7. Phase and coherency

Equation (8) is universal in the sense that the J_p and J_q terms represent all effects along the signal path rolled up into one 2 × 2 matrix. It is time to examine these in more detail. In the ideal case of a completely uncorrupted observation, there is one fundamental effect remaining: that of phase delay associated with signal propagation. We are not interested in absolute phase, since the averaging operator implicit in a correlation measurement such as Eq. (3) is only sensitive to the phase difference between voltages v_p and v_q.

Phase difference is due to the geometric pathlength difference from source to antennas p and q. For reasons discussed in Sect. 5.2, we want to minimize this difference for a specific direction, so a correlator will usually introduce additional delay terms to compensate for the pathlength difference in the chosen direction, effectively "steering" the interferometer. This direction is called the phase centre. The conventional approach is to consider phase differences on baseline pq, but for our purposes let us pick an arbitrary zero point, and consider the phase difference at each antenna p relative to that zero point.

Let us adopt the conventional coordinate system⁸ and notations (see e.g. Thompson et al. 2001), with the z axis pointing towards the phase centre, and consider antenna p located at coordinates u_p = (u_p, v_p, w_p). The phase difference at point u_p relative to u = 0, for a signal arriving from direction σ, is given by

$$
\kappa_p = 2\pi\lambda^{-1}\left(u_p l + v_p m + w_p(n - 1)\right),
$$

where l, m and n = √(1 − l² − m²) are the direction cosines of σ, and λ is the signal wavelength. It is customary to define u in units of wavelength, which allows us to omit the λ⁻¹ term. Following Noordam (1996), I can now introduce a scalar K-Jones matrix representing the phase delay effect. After all, phase delay is just another linear transformation of the signal, and is perfectly amenable to the Jones formalism:

$$
K_p = e^{-i\kappa_p} = e^{-2\pi i(u_p l + v_p m + w_p(n-1))} \tag{10}
$$

6 Note that this is only true for 2 × 2 matrices. Higher-order rotations do not commute.

7 As noted above, rotation can become diagonality through change of coordinate basis, so this doesn't actually add anything to our general rule.

8 Note that there is some unfortunate confusion in coordinate systems used in radio interferometry. The IAU (1973) defines Stokes parameters in a right-handed coordinate system with x and y in the plane of the sky towards North and East, and the z axis pointing towards the observer. The conventional lm frame has l pointing East and m North. In practice, this means that rotation through parallactic angle must be applied in one direction in the lm frame, and in the opposite direction in the polarization frame. The formulations of the present paper are not affected.

The RIME for a single uncorrupted point source is then simply:

$$
\mathsf{V}_{pq} = K_p \mathsf{B} K_q^H \tag{11}
$$

Substituting the exponents for K_p from Eq. (10), and remembering that scalar matrices commute with everything, we can recast Eq. (11) in a more traditional form⁹:

$$
\mathsf{V}_{pq} = \mathsf{B}\, e^{-2\pi i(u_{pq} l + v_{pq} m + w_{pq}(n-1))}, \qquad \boldsymbol{u}_{pq} = \boldsymbol{u}_p - \boldsymbol{u}_q \tag{12}
$$

which expresses the visibility as a function of baseline uvw coordinates u_pq. I shall call the visibility matrix given by Eqs. (11) or (12) the source coherency, and write it as X_pq. In the traditional view of radio interferometry, X_pq is a measurement of the coherency function X(u, v, w) at point u_pq, v_pq, w_pq (with X being a 2 × 2 complex matrix rather than the traditional scalar complex function). For the purposes of these papers, let us adopt an operational definition of source coherency as being the visibility that would be measured by a corruption-free interferometer. For a point source, the coherency is given by Eq. (11).
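Eqs. (10) to (12) can be verified numerically: the per-antenna scalar K-Jones terms combine into a single baseline phase factor, and shifting the arbitrary zero point of the antenna coordinates leaves the coherency unchanged. A minimal NumPy sketch with u in wavelengths; the helper `k_jones`, the antenna coordinates, the source offset and the brightness matrix are illustrative assumptions, and the sign of the exponent follows Eq. (10), which is itself a convention (footnote 9).

```python
import numpy as np

def k_jones(u, l, m):
    """Scalar K-Jones matrix of Eq. (10); u = (u, v, w) in wavelengths."""
    n = np.sqrt(1 - l**2 - m**2)
    phase = -2j * np.pi * (u[0] * l + u[1] * m + u[2] * (n - 1))
    return np.exp(phase) * np.eye(2)

l, m = 0.01, -0.02                       # source offset from the phase centre
n = np.sqrt(1 - l**2 - m**2)
up = np.array([120.0, -40.0, 3.0])       # hypothetical antenna positions
uq = np.array([-30.0, 75.0, -1.5])
B = np.array([[1.2, 0.1 + 0.05j],
              [0.1 - 0.05j, 0.8]])       # hypothetical brightness matrix

# Eq. (11): per-antenna form
X11 = k_jones(up, l, m) @ B @ k_jones(uq, l, m).conj().T

# Eq. (12): baseline form with u_pq = u_p - u_q
upq = up - uq
X12 = B * np.exp(-2j * np.pi * (upq[0] * l + upq[1] * m + upq[2] * (n - 1)))
print(np.allclose(X11, X12))             # True

# Moving the common zero point does not change the measured coherency.
shift = np.array([500.0, 500.0, 10.0])
X_shift = k_jones(up + shift, l, m) @ B @ k_jones(uq + shift, l, m).conj().T
print(np.allclose(X_shift, X11))         # True
```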

1.8. A single corrupted point source

A real-world interferometer will have some "corrupting" effects in the signal path, in addition to the nominal phase delay K_p. Since the latter is scalar and thus commutes with everything, we can move it to the beginning of the Jones chain, and write the total Jones J_p of Eq. (8) as

$$
\boldsymbol{J}_p = \boldsymbol{G}_p K_p,
$$

where G_p represents all the other (corrupting) effects. We can then formulate the RIME for a single corrupted point source as:

$$
\mathsf{V}_{pq} = \boldsymbol{G}_p \mathsf{X}_{pq} \boldsymbol{G}_q^H \tag{13}
$$

where X_pq is the source coherency, as defined above.



2. Multiple discrete sources

Let us now consider a sky composed of N point sources. The contributions of each source to the measured visibility matrix V_pq add up linearly. The signal propagation path is different for each source s and antenna p, but each path can be described by its own Jones matrix J_sp. Equation (8) then becomes:

$$
\mathsf{V}_{pq} = \sum_{s} \boldsymbol{J}_{sp} \mathsf{B}_s \boldsymbol{J}_{sq}^H \tag{14}
$$

Remember that each J_sp is a product of a (generally non-commuting) Jones chain, corresponding to the physical order of effects along the signal path:

$$
\boldsymbol{J}_{sp} = \boldsymbol{J}_{spn} \cdots \boldsymbol{J}_{sp1},
$$

where effects represented by the right side of the chain (⋯J_sp1) occur "at the source", and effects on the left side of the chain (J_spn⋯) "at the antenna". Somewhere along the chain is the phase term K_sp, but since (being a scalar matrix) it commutes with everything, we are free to move it to any position in the product.

9 The sign of the exponent in these equations is a matter of convention, and is therefore subject to perennial confusion. WSRT software uses "−" but has used "+" in the past. VLA software seems to use "+". Fortunately, in practice it is usually easy to tell which convention is being used, and conjugate the visibilities if needed.

Some elements in the chain may be the same for all sources. This tends to be true for effects at the antenna end of the signal path, such as electronic gain. Let us then collapse the chain into a product of three Jones matrices:

$$
\boldsymbol{J}_{sp} = \boldsymbol{G}_p \boldsymbol{E}_{sp} K_{sp}
$$

G_p is the source-independent "antenna" (left) side of the Jones chain, i.e. the product of the terms beginning with J_spn, up to and not including the leftmost source-dependent term (if the entire chain is source-dependent, G_p is simply unity); E_sp is the source-dependent remainder of the chain; and K_sp is the phase term. We can then recast Eq. (14) as follows:

$$
\mathsf{V}_{pq} = \boldsymbol{G}_p \left( \sum_s \boldsymbol{E}_{sp} K_{sp} \mathsf{B}_s K_{sq}^H \boldsymbol{E}_{sq}^H \right) \boldsymbol{G}_q^H \tag{15}
$$

Or, using the source coherency of Eq. (11):

$$
\mathsf{V}_{pq} = \boldsymbol{G}_p \left( \sum_s \boldsymbol{E}_{sp} \mathsf{X}_{spq} \boldsymbol{E}_{sq}^H \right) \boldsymbol{G}_q^H \tag{16}
$$

G_p describes the direction-independent effects (DIEs), or the uv-Jones terms, and E_sp the direction-dependent effects (DDEs), or the sky-Jones terms.

In principle, the sum in Eq. (16) should be taken over all sufficiently bright¹⁰ sources in the sky, but in practice our FoV is limited by the voltage beam pattern of each antenna, or by the horizon, in the case of an all-sky instrument such as the Low Frequency Array (LOFAR). In RIME terms, beam gain is just another Jones term in the chain, ensuring E_sp → 0 for sources outside the beam.

If the observed field has little or no spatially extended emission, this form of the RIME is already powerful enough to allow for calibration of DDEs, as I shall show in Paper III (Smirnov 2011b).
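Eq. (15) translates directly into a prediction loop over sources. The sketch below is a minimal NumPy illustration with hypothetical gains, beam values and source positions; `predict_vis` is an illustrative helper, not the API of any calibration package. It checks the remark above: a source whose sky-Jones term is zero (outside the beam) contributes nothing, and for a source at the phase centre the K terms reduce to unity, leaving G_p E B Eᴴ G_qᴴ.

```python
import numpy as np

def k_phase(u, l, m):
    """Scalar phase of Eq. (10) for antenna position u (wavelengths)."""
    n = np.sqrt(1 - l**2 - m**2)
    return np.exp(-2j * np.pi * (u[0] * l + u[1] * m + u[2] * (n - 1)))

def predict_vis(Gp, Gq, Ep, Eq, B, lm, up, uq):
    """Eq. (15): V_pq = G_p [sum_s E_sp K_sp B_s K_sq^H E_sq^H] G_q^H.

    Ep, Eq : (N, 2, 2) per-source sky-Jones terms for antennas p and q
    B      : (N, 2, 2) per-source brightness matrices
    lm     : (N, 2) source direction cosines
    """
    acc = np.zeros((2, 2), dtype=complex)
    for s, (l, m) in enumerate(lm):
        # K terms are scalar, so they collapse to one phase factor per source
        phase = k_phase(up, l, m) * np.conj(k_phase(uq, l, m))
        acc += phase * (Ep[s] @ B[s] @ Eq[s].conj().T)
    return Gp @ acc @ Gq.conj().T

# Hypothetical two-source sky: source 0 at the phase centre inside the
# main beam, source 1 outside the beam (its sky-Jones term set to zero).
lm = np.array([[0.0, 0.0], [0.05, 0.03]])
B = np.array([2.0 * np.eye(2), 1.0 * np.eye(2)], dtype=complex)
up, uq = np.array([100.0, 20.0, 1.0]), np.array([-50.0, 60.0, 0.0])
Gp = np.diag([1.1, 0.9 + 0.1j])              # direction-independent gains
Gq = np.diag([0.95j, 1.02])
Ep = np.array([0.9 * np.eye(2), np.zeros((2, 2))], dtype=complex)
Eq = Ep.copy()

V_both = predict_vis(Gp, Gq, Ep, Eq, B, lm, up, uq)
V_first = predict_vis(Gp, Gq, Ep[:1], Eq[:1], B[:1], lm[:1], up, uq)
print(np.allclose(V_both, V_first))                             # True
print(np.allclose(V_first, Gp @ (0.81 * B[0]) @ Gq.conj().T))   # True
```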



3. The full-sky RIME

In the more general case, the sky is not a sum of discrete sources, but rather a continuous brightness distribution B(σ), where σ is a (unit) direction vector. For each antenna p, we then have a Jones term J_p(σ), describing the signal path for direction σ. To get the total visibility as measured by an interferometer, we must integrate Eq. (8) over all possible directions, i.e. over a unit sphere:

$$
\mathsf{V}_{pq} = \iint_{4\pi} \boldsymbol{J}_p(\boldsymbol{\sigma}) \mathsf{B}(\boldsymbol{\sigma}) \boldsymbol{J}_q^H(\boldsymbol{\sigma})\, d\Omega
$$

This spherical integral is not very tractable, so we perform a sine projection of the sphere onto the plane (l, m) tangential at the field centre¹¹. Note that this analysis is fully analogous to that of Thompson et al. (2001, Sect. 3.1), with only the integrand being somewhat different. The integral then becomes:

$$
\mathsf{V}_{pq} = \iint_{lm} \boldsymbol{J}_p(\boldsymbol{l}) \mathsf{B}(\boldsymbol{l}) \boldsymbol{J}_q^H(\boldsymbol{l})\, \frac{dl\,dm}{n}, \qquad n = \sqrt{1 - l^2 - m^2}
$$

10 Brighter than the noise, that is; see Sect. 5.1.

11 Or the pole, for East-West arrays, which does not materially change any of the arguments.

I shall use $\boldsymbol{l}$ and $(l, m)$ interchangeably from now on. By analogy with Eq. (13), we now decompose J_p(l) into a direction-independent part G, a direction-dependent part E, and the phase term K:

$$
\boldsymbol{J}_p(\boldsymbol{l}) = \boldsymbol{G}_p \boldsymbol{E}_p(\boldsymbol{l}) K_p(\boldsymbol{l}) = \boldsymbol{G}_p \boldsymbol{E}_p(\boldsymbol{l})\, e^{-2\pi i(u_p l + v_p m + w_p(n-1))}
$$

Substituting this into the integral, and commuting the K terms around, we get

$$
\mathsf{V}_{pq} = \boldsymbol{G}_p \left( \iint_{lm} \frac{1}{n} \boldsymbol{E}_p \mathsf{B} \boldsymbol{E}_q^H\, e^{-2\pi i(u_{pq} l + v_{pq} m + w_{pq}(n-1))}\, dl\,dm \right) \boldsymbol{G}_q^H \tag{17}
$$

This equation is one form of a general full-sky RIME. It is in fact a type of three-dimensional Fourier transform; the non-coplanarity term in the exponent, w_pq(n − 1), is what prevents us from treating it as the much simpler 2D transform. Since w_pq = w_p − w_q, we can decompose the non-coplanarity term into per-antenna terms W_p = n^{-1/2} e^{-2πi w_p(n−1)}. These can be thought of as direction-dependent Jones matrices in their own right, and subsumed into the overall sky-Jones term by redefining E_p as E_p W_p (the n^{-1/2} factors of W_p and W_q together absorb the 1/n of Eq. (17)). The full-sky RIME (Eq. (17)) can then be rewritten using a 2D Fourier transform of the apparent sky as seen by baseline pq, or B_pq:

$$
\mathsf{V}_{pq} = \boldsymbol{G}_p \left( \iint_{lm} \mathsf{B}_{pq}\, e^{-2\pi i(u_{pq} l + v_{pq} m)}\, dl\,dm \right) \boldsymbol{G}_q^H, \qquad \mathsf{B}_{pq} = \boldsymbol{E}_p \mathsf{B} \boldsymbol{E}_q^H \tag{18}
$$
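The per-antenna factorization of the non-coplanarity term can be checked numerically. This minimal NumPy sketch assumes W_p = n^{-1/2} e^{-2πi w_p(n−1)}, i.e. that the square root of the 1/n factor of Eq. (17) is folded into each per-antenna term; the direction and the w values are illustrative.

```python
import numpy as np

def w_term(w, l, m):
    """Assumed per-antenna term W_p = n^(-1/2) exp(-2 pi i w_p (n - 1))."""
    n = np.sqrt(1 - l**2 - m**2)
    return np.exp(-2j * np.pi * w * (n - 1)) / np.sqrt(n)

l, m = 0.3, 0.2                 # a direction far from the field centre
wp, wq = 45.0, -12.5            # hypothetical w coordinates in wavelengths
n = np.sqrt(1 - l**2 - m**2)

# W_p W_q^* should reproduce the baseline factor (1/n) exp(-2 pi i w_pq (n - 1)).
per_antenna = w_term(wp, l, m) * np.conj(w_term(wq, l, m))
per_baseline = np.exp(-2j * np.pi * (wp - wq) * (n - 1)) / n
print(np.isclose(per_antenna, per_baseline))   # True
```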



I shall return to this general formulation in Paper II (Smirnov 2011a). In the meantime, consider the import of those pq indices in B_pq. They are telling us that we're measuring a 2D Fourier transform of the sky, but the "sky" is different for every baseline! This violates the fundamental premise of traditional selfcal, which assumes that we're measuring the FT of one common sky. From the above, it follows that this premise only holds when all DDEs are identical across all antennas, E_p(l) = E(l) (or at least wherever B(l) ≠ 0). Only under this condition does the apparent sky B_pq become the same on all baselines (in the traditional view, this corresponds to the "true" sky attenuated by the power beam):

$$
\mathsf{B}_{pq}(\boldsymbol{l}) = \mathsf{B}_{\mathrm{app}}(\boldsymbol{l}) = \boldsymbol{E}(\boldsymbol{l}) \mathsf{B}(\boldsymbol{l}) \boldsymbol{E}^H(\boldsymbol{l})
$$

If this is met, we can then rewrite the full-sky RIME as: 



$$V_{pq} = G_p X_{pq} G_q^H, \qquad (19)$$

where $X_{pq} = X(u_{pq})$,¹² and the matrix function $X(u)$ is simply the (element-by-element) two-dimensional Fourier transform of the matrix function $B_{app}(\vec l)$. I shall also write this as $X = \mathcal{F} B_{app}$. The similarity to Eq. (13) of a single point source is readily apparent. For obvious reasons, I shall call $X(u)$ the sky coherency. Effectively, we have derived the van Cittert-Zernike theorem (VCZ), the cornerstone of radio interferometry (Thompson et al. 2001, Sect. 14.1), from the basic RIME!

12 Note that I'm using u as a shorthand for both (u, v) and (u, v, w), depending on context.
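Eq. (19) suggests a direct way to predict visibilities when all DDEs are trivial: evaluate the 2D Fourier transform of the apparent sky at each $u_{pq}$ and sandwich it between the gains. The sketch below assumes a sky of discrete point sources whose apparent brightness matrices $B_{app}$ are already known (so the transform reduces to a sum) and a coplanar array; the function names, gains and source list are hypothetical and purely illustrative.

```python
import numpy as np

def brightness(I, Q=0.0, U=0.0, V=0.0):
    """Brightness matrix B in linear (xy) coordinates, with the sign convention of Eq. (7)."""
    return np.array([[I + Q, U + 1j * V],
                     [U - 1j * V, I - Q]], dtype=complex)

def sky_coherency(u, v, sources):
    """X(u) for an apparent sky of point sources: sum_s B_s exp(-2 pi i (u l_s + v m_s))."""
    X = np.zeros((2, 2), dtype=complex)
    for l, m, B_app in sources:
        X += B_app * np.exp(-2j * np.pi * (u * l + v * m))
    return X

def predict_vis(Gp, Gq, upq, sources):
    """Eq. (19): V_pq = G_p X(u_pq) G_q^H (only the u, v components of u_pq are used)."""
    return Gp @ sky_coherency(upq[0], upq[1], sources) @ Gq.conj().T

# Illustrative apparent sky: an unpolarized unit source at the phase centre
# and a weaker, slightly polarized source off-axis
sky = [(0.0, 0.0, brightness(1.0)),
       (0.01, -0.005, brightness(0.2, Q=0.02, U=-0.01))]
Gp = np.diag([1.1 * np.exp(0.3j), 0.9 * np.exp(-0.1j)])   # hypothetical diagonal gains
Gq = np.diag([0.95 * np.exp(-0.2j), 1.05 * np.exp(0.4j)])
upq = np.array([250.0, -120.0])
V_pq = predict_vis(Gp, Gq, upq, sky)
V_qp = predict_vis(Gq, Gp, -upq, sky)
```

Since each $B_{app}$ is Hermitian, $X(-u) = X(u)^H$, so the reversed baseline returns $V_{qp} = V_{pq}^H$, as it should.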

Such an approach turns the original coherency matrix formulation of Hamaker (2000) on its head. Note that Eq. (19) here is the same as Eq. (2) of that work. In the RIME papers, Hamaker et al. defer to VCZ, treating the coherency as a "given" (while recasting it to matrix form) to which Jones matrices then apply. Treating phase (K) as a Jones matrix in its own right (Noordam 1996) allows for a natural extension of the Jones formalism into the (l, m) plane, and shows that VCZ is actually a consequence of the RIME rather than being something extrinsic to it. This also allows DDEs to be incorporated into the same formalism, in a manner similar to that suggested for w-projection (Cornwell et al. 2008). I shall return to this subject in Paper II (Smirnov 2011a).



3.1. Time variability and the fundamental assumption of selfcal

I have hitherto ignored the time variable. Signal propagation effects, and indeed the sky itself, do vary in time, but the RIME describes an effectively instantaneous measurement (ignoring for the moment the issue of time averaging, which will be considered separately in Sect. 5.2). Time begins to play a critical role when we consider DDEs.

At any point in time, an interferometer given by Eq. (19) measures the coherency function $X(u)$ at a number of points $u_{pq}$ (i.e. for all baselines pq). This "snapshot" measurement gives a limited sampling of the uv plane. To sample the uv plane more fully, we usually rely on the Earth's rotation, which over several hours effectively "swings" every baseline vector $u_{pq}$ through an arc in the uv plane. Therefore, for Eq. (19) to hold throughout an observation, we must additionally assume that the apparent sky $B_{app}$ remains constant over the observation time! In other words, unless we're dealing with snapshot imaging, the $E_p = E$ assumption must be further augmented:



$$E_p(t, \vec l) = E_p(\vec l) = E(\vec l) \quad \text{for all } t, p. \qquad (20)$$



This equation captures the fundamental assumption of traditional selfcal. I shall call DDEs that satisfy Eq. (20) trivial DDEs. As shown above, trivial DDEs effectively replace the true sky B by a single apparent sky $B_{app}$, and are not usually a problem for calibration, since they can be corrected for entirely in the image plane.¹³ For example, the primary beam gain is usually treated as a trivial DDE in 2GC packages (see Paper II, Smirnov 2011a, Sect. 2.1).

Equation (20) is most readily met with narrow FoVs (i.e. with $E_p$ rapidly going to zero away from the field centre, leaving little scope for other variations), small arrays (small $w_p$; also, all stations see through the same atmosphere), higher frequencies (narrow FoV, weaker ionospheric effects), and also with coplanar arrays such as the WSRT ($w_p = 0$, thus $W_p = 1$). The new crop of instruments is, of course, trending in the opposite direction on all these points, and is thus subject to far more severe and non-trivial DDEs.
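The distinction between trivial and non-trivial DDEs can be made concrete for a single direction at a single instant: compute the per-baseline apparent sky $B_{pq} = \frac{1}{n} E_p B E_q^H$ and check whether it is the same on every baseline (the time-invariance part of Eq. 20 is not tested here). In the sketch below, the diagonal beam gains are invented values standing in for $E_p$ at one direction, and `baseline_skies` and `is_common_sky` are hypothetical helpers.

```python
import itertools
import numpy as np

def baseline_skies(E, B, n=1.0):
    """Apparent sky B_pq = (1/n) E_p B E_q^H for every baseline p < q, at one direction."""
    return {(p, q): E[p] @ B @ E[q].conj().T / n
            for p, q in itertools.combinations(range(len(E)), 2)}

def is_common_sky(skies, tol=1e-12):
    """True if all baselines see the same apparent sky (the selfcal premise)."""
    vals = list(skies.values())
    return all(np.allclose(v, vals[0], atol=tol) for v in vals)

B = np.eye(2, dtype=complex)   # unpolarized unit source: I = 1, Q = U = V = 0

# Identical beams on all antennas: a trivial DDE (at this instant)
E_same = [np.diag([0.8, 0.8]).astype(complex)] * 3
# Slightly different beams per antenna (e.g. pointing errors): a non-trivial DDE
E_diff = [np.diag([0.8, 0.8]), np.diag([0.7, 0.75]), np.diag([0.8 * np.exp(0.1j), 0.8])]

same = baseline_skies(E_same, B)
diff = baseline_skies(E_diff, B)
```

With identical beams every baseline sees the same attenuated sky; with per-antenna differences the "sky" becomes baseline-dependent, and no single image can represent it.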



13 Even then things are not always easy. Rapid variation in frequency, such as the 17 MHz "ripple" of the WSRT primary beam (see Paper II, Smirnov 2011a, Sect. 2.1.1), can cause considerable difficulty for spectral line calibration, even if the DDE is trivial in the sense of Eq. (20).

4. Matrix closures and singularities

Scalar closure relationships have played an important role in 
2GC calibration, both as a diagnostic tool, and as an observable. Traditionally, these are expressed in terms of a three-way phase closure and a four-way amplitude closure (see e.g. Thompson et al. 2001, Sect. 10.3). Since the underlying premise
of a closure relationship is that observed scalar visibilities can 
be expressed in terms of per-antenna scalar gains, and the RIME 
is a generalization of the same premise in matrix terms, it seems 
worthwhile to see if a general matrix (i.e. fully polarimetric) clo- 
sure relationship can be derived. 

Indeed, in the case of a single point source, we can write out a four-way closure for antennas m, n, p, q as follows:



$$V_{mn} V_{pn}^{-1} V_{pq} V_{mq}^{-1} = \mathbf{1} \qquad (21)$$



The above equation can be easily verified by substituting in Eq. (8) for each visibility term, and remembering that $(AB)^{-1} = B^{-1} A^{-1}$.
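The closure relation of Eq. (21) is easy to check numerically. The sketch below draws random complex Jones matrices for four antennas (almost surely non-singular), forms point-source visibilities $V_{ij} = J_i B J_j^H$, and evaluates the left-hand side of Eq. (21); the random seed, the Stokes values and the helper names are arbitrary choices for illustration.

```python
import numpy as np

def brightness(I, Q=0.0, U=0.0, V=0.0):
    """Brightness matrix B (Eq. 7 sign convention)."""
    return np.array([[I + Q, U + 1j * V], [U - 1j * V, I - Q]], dtype=complex)

def closure_product(J, B, m, n, p, q):
    """Left-hand side of Eq. (21): V_mn V_pn^-1 V_pq V_mq^-1, with V_ij = J_i B J_j^H."""
    def vis(i, j):
        return J[i] @ B @ J[j].conj().T
    inv = np.linalg.inv
    return vis(m, n) @ inv(vis(p, n)) @ vis(p, q) @ inv(vis(m, q))

rng = np.random.default_rng(1)
J = [rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)) for _ in range(4)]
B = brightness(2.0, Q=0.3, U=-0.2, V=0.1)     # det B = I^2 - Q^2 - U^2 - V^2 > 0
C = closure_product(J, B, 0, 1, 2, 3)
```

The product collapses to $J_m J_p^{-1} J_p J_m^{-1}$, so `C` should equal the unit matrix to rounding error, independently of the particular Jones matrices drawn.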

Since matrix inversion is involved, the essential requirement here is non-singularity of all matrices in Eq. (8). The brightness matrix B is non-singular unless it is trivially zero or the source is fully polarized (since $\det B = I^2 - Q^2 - U^2 - V^2$), but what does it mean for a Jones matrix to be singular? Some examples of singular matrices are:



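$$\begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ a & b \end{pmatrix}, \quad \begin{pmatrix} a & b \\ a & b \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} a & a \\ b & b \end{pmatrix}$$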



The physical meaning of a singular Jones matrix can be grasped by substituting these into Eq. (0. The first two examples correspond to an antenna measuring zero voltage on one of the receptors (e.g. a broken wire). The latter two are examples of redundant measurements: both receptors will measure the same voltage, or linearly dependent voltages (consider, e.g., a flat aperture array, with a source in the plane of the dipoles). In all four cases there's irrecoverable loss of polarization information, so a polarization closure relation like Eq. (21) breaks down. (Note that the scalar analogue of this is simply a null scalar visibility, in which case scalar closures also break down.)
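Singularity can be read off from the rank of J. Matrices of the kinds described above (a row of zeros, identical rows, or proportional rows) all have rank 1, so there is a non-zero field vector that produces zero voltage, and any two field vectors differing by it become indistinguishable. The sketch below uses arbitrary illustrative values for a and b, and `null_field` is a hypothetical helper based on the SVD.

```python
import numpy as np

a, b = 0.9 + 0.1j, 0.3 - 0.2j      # arbitrary non-zero complex entries

examples = {
    "zero first receptor row": np.array([[a, b], [0, 0]]),
    "zero second receptor row": np.array([[0, 0], [a, b]]),
    "identical receptors": np.array([[a, b], [a, b]]),
    "dependent receptors": np.array([[a, a], [b, b]]),
}

def null_field(J):
    """Unit field vector e0 with J e0 = 0 (right singular vector of the smallest singular value)."""
    _, _, vh = np.linalg.svd(J)
    return vh[-1].conj()

ranks = {name: np.linalg.matrix_rank(J) for name, J in examples.items()}

# Two different fields give the same voltages through a singular J
e = np.array([1.0, 0.5j])
J = examples["dependent receptors"]
v1, v2 = J @ e, J @ (e + null_field(J))
```

Because `v1` and `v2` coincide, no calibration can tell the two fields apart, which is the loss of polarization information referred to above.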

In the wide-field or all-sky case (Eq. 18), simple closures (whether matrix or scalar) no longer apply. However, the contribution of each discrete point source to the overall visibility is still subject to a closure relationship. It is perhaps useful to formulate this in differential terms. Consider a brightness distribution $B^{(0)}(\vec l)$, and let this correspond to a set of observed visibilities $V^{(0)}_{pq}$. Adding a point source of flux $B_1$ at position $\vec l_1$ gives us the brightness distribution:

$$B^{(1)}(\vec l) = B^{(0)}(\vec l) + \delta(\vec l - \vec l_1)\, B_1,$$

where $\delta$ is the Dirac delta function, with corresponding observed visibilities $V^{(1)}_{pq}$. From the RIME (and Eq. 18 in particular) it then necessarily follows that the differential visibilities $\Delta V_{pq} = V^{(1)}_{pq} - V^{(0)}_{pq}$ will satisfy the matrix closure relationship of Eq. (21).

5. Limitations of the RIME formalism

5.1. Noise

The RIME as presented here and in the original papers is formulated for a noise-free measurement. In practice, each element of the $V_{pq}$ matrix (i.e. each complex visibility) is accompanied by uncorrelated Gaussian noise in the real and imaginary parts; a detailed treatment of this can be found in Thompson et al. (2001, Sect. 6.2). The noise level imposes a hard sensitivity limit on any given observation, which has a few implications relevant to our purposes:

- "Reaching the noise" has become the "gold standard" of calibration (see Paper II, Smirnov 2011a). Many reductions are limited by calibration artifacts rather than the noise.

- Corrections to the data (however one defines the term) can potentially distort the noise level across an observation in complicated ways, so due care must be taken.

- Faint sources below the noise threshold can be effectively ignored.

- Numerical approximations can be considered "good enough" once they get to within the noise (assuming no systematic errors), but see Paper III (Smirnov 2011b, Sect. 2.6, Fig. 17) for a big caveat to this.

The latter two considerations are what I refer to by "sufficiently faint" sources and "sufficiently close" approximations throughout this series of papers.

5.2. Smearing and decoherence

In Sect. 1.3, when going from Eq. (5) to (6), we assumed that the Jones matrix $J_p$ is constant over the time/frequency bin of the correlator. That this is, strictly speaking, never actually the case can be seen from the definition of the K-Jones term in Eq. (10). The vector $u_p$ is defined in units of wavelength, making $K_p$ variable in frequency. The Earth's rotation causes $u_p$ to rotate in our (fixed relative to the sky) coordinate frame, which also makes $K_p$ variable in time. To take this into account, the RIME (in any form) should be rewritten as an integration over a time/frequency interval. For example, the basic RIME of Eq. (8), when considering the integration bin $[t_0, t_1] \times [\nu_0, \nu_1]$, should be properly rewritten as:

$$\langle V_{pq} \rangle = \frac{1}{\Delta t\,\Delta\nu} \int_{t_0}^{t_1}\!\!\int_{\nu_0}^{\nu_1} V_{pq}(t,\nu)\,d\nu\,dt = \frac{1}{\Delta t\,\Delta\nu} \int_{t_0}^{t_1}\!\!\int_{\nu_0}^{\nu_1} J_p(t,\nu)\,B\,J_q^H(t,\nu)\,d\nu\,dt, \qquad (22)$$

where $\Delta t = t_1 - t_0$ and $\Delta\nu = \nu_1 - \nu_0$. This becomes Eq. (8) in the limit of $\Delta t, \Delta\nu \to 0$. Since J contains K, the complex phase of which is variable in frequency and time, the integration in Eq. (22) always results in a net loss of amplitude in the measured $\langle V_{pq} \rangle$. This mechanism is well known in classical interferometry, and is commonly called time/bandwidth decorrelation or smearing. Note that a phase variation in any other Jones term in the signal chain will have a similar effect. The VLBI community knows of it in the guise of decoherence due to atmospheric phase variations; in RIME terms, atmospheric decoherence is just Eq. (22) applied to ionospheric Z-Jones or tropospheric T-Jones.¹⁴ I shall use the term decoherence for the general effect, and smearing for the specific case of decoherence caused by the K term.

14 Small interferometers see very little atmospheric decoherence: if $Z_p \approx Z_q$ (as is the case for closely located stations), then $Z_p Z_q^H \approx 1$, so there is no net phase contribution to the integrand of Eq. (22).

The mathematics of smearing are well known for the scalar case; see e.g. Thompson et al. (2001, Sect. 6.4) and Bridle & Schwab (1999). Smearing increases with baseline length ($|u_{pq}|$) and distance from phase center (l, m). Since the noise amplitude does not decrease, smearing results in a decrease of sensitivity. Hamaker et al. (1996) mention smearing in the context of the RIME. Since integration (and thus smearing) of a matrix equation is an element-by-element operation, treatment of smearing within the RIME formalism is a trivial extension of the scalar equations.
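As an illustration of Eq. (22) and of the element-by-element nature of smearing, the sketch below averages $J_p B J_q^H$ over a frequency bin when the only Jones term is K, for an east-west baseline and a source at (l, 0) with w = 0. The baseline length, channel and source offsets are hypothetical values, and `smeared_vis` is an illustrative helper. For this linear phase ramp, the averaged amplitude should follow |sinc| of the number of fringe cycles across the bin.

```python
import numpy as np

C_LIGHT = 299_792_458.0   # speed of light, m/s

def smeared_vis(B, baseline_m, l, nu0, nu1, nsamp=4000):
    """<V_pq> over [nu0, nu1] (Eq. 22 in frequency only) for a point source at (l, 0).

    K_p K_q^* = exp(-2 pi i u_pq l), with u_pq = baseline_m * nu / c varying across the bin.
    The average is a midpoint-rule estimate of (1/dnu) * integral of K_p B K_q^H dnu.
    """
    dnu = (nu1 - nu0) / nsamp
    nu = nu0 + (np.arange(nsamp) + 0.5) * dnu
    phase = np.exp(-2j * np.pi * (baseline_m * nu / C_LIGHT) * l)
    return B * phase.mean()        # scalar K: smearing acts identically on every element of B

B = np.eye(2, dtype=complex)          # unpolarized unit source
band = (1.40e9, 1.41e9)               # a 10 MHz channel (illustrative)
amps = {l: abs(smeared_vis(B, 2700.0, l, *band)[0, 0]) for l in (0.0, 0.005, 0.01)}

# Expected for a linear phase ramp: |sinc| of the number of fringe cycles across the bin
cycles = {l: 2700.0 * (band[1] - band[0]) * l / C_LIGHT for l in amps}
expected = {l: abs(np.sinc(c)) for l, c in cycles.items()}   # np.sinc(x) = sin(pi x)/(pi x)
```

The amplitude is untouched at the phase centre and drops steadily with distance from it, which is the behaviour described above.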

For the general case of decoherence, a useful first-order approximation can be obtained by assuming that $\Delta t$ and $\Delta\nu$ are small enough that the amplitude of $V_{pq}$ remains constant, while the complex phase varies linearly. The relation

$$\int_0^1 e^{iax}\,dx = \mathrm{sinc}\frac{a}{2}\; e^{ia/2}, \qquad \mathrm{sinc}\,x = \frac{\sin x}{x},$$

which is well known from the case of smearing with a square taper, then gives us an approximate equation for decoherence, in terms of the phase changes in time ($\Delta\Psi$) and frequency ($\Delta\Phi$):

$$\langle V_{pq} \rangle \approx \mathrm{sinc}\frac{\Delta\Psi}{2}\; \mathrm{sinc}\frac{\Delta\Phi}{2}\; V_{pq}(t_{mid}, \nu_{mid}), \qquad (23)$$

where $t_{mid} = (t_0 + t_1)/2$, $\nu_{mid} = (\nu_0 + \nu_1)/2$,
$\Delta\Psi = \arg V_{pq}(t_1, \nu_{mid}) - \arg V_{pq}(t_0, \nu_{mid})$,
$\Delta\Phi = \arg V_{pq}(t_{mid}, \nu_1) - \arg V_{pq}(t_{mid}, \nu_0)$.

Equation (23) is straightforward to apply numerically, and is independent of the particular form of J responsible for the decoherence. However, the assumption of linearity in phase over the time/frequency bin can only hold for the visibility of a single source. In fact, it is easy to see that any approximation treating decoherence as an amplitude-only effect can, in principle, only apply on a source-by-source basis: just consider the case of smearing, which varies significantly with distance from phase centre. In an equation like (16), the approximation can be applied to each term in the sum individually, or at least to as many of the brightest sources as is practical. This approach was used for the calibration described in Paper III (Smirnov 2011b).
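Eq. (23) translates directly into code. The sketch below applies it to one scalar correlation (for a 2x2 $V_{pq}$ it would be applied per element, and per source as discussed above) and compares it against a brute-force average for a visibility whose phase is exactly linear in time and frequency, where the approximation should be essentially exact. The callable-based interface, the helper names and the test visibility are hypothetical choices made for illustration.

```python
import numpy as np

def sinc_unnorm(x):
    """sin(x)/x with value 1 at x = 0 (np.sinc is the normalized sin(pi x)/(pi x))."""
    return np.sinc(x / np.pi)

def decoherence_estimate(vis, t0, t1, nu0, nu1):
    """Approximate <V> over [t0, t1] x [nu0, nu1] with Eq. (23).

    vis(t, nu) returns a complex visibility. Phase differences are taken with np.angle,
    so the phase change across the bin is assumed to be smaller than pi in magnitude.
    """
    tm, num = 0.5 * (t0 + t1), 0.5 * (nu0 + nu1)
    dpsi = np.angle(vis(t1, num)) - np.angle(vis(t0, num))   # phase change in time
    dphi = np.angle(vis(tm, nu1)) - np.angle(vis(tm, nu0))   # phase change in frequency
    return sinc_unnorm(dpsi / 2) * sinc_unnorm(dphi / 2) * vis(tm, num)

def brute_force_average(vis, t0, t1, nu0, nu1, n=400):
    """Midpoint-rule average of vis over the bin (the left-hand side of Eq. 22 for one element)."""
    t = t0 + (np.arange(n) + 0.5) * (t1 - t0) / n
    nu = nu0 + (np.arange(n) + 0.5) * (nu1 - nu0) / n
    tt, nn = np.meshgrid(t, nu, indexing="ij")
    return vis(tt, nn).mean()

def vis(t, nu):
    """Test visibility: constant amplitude, phase linear in t and nu (arbitrary units)."""
    return 2.0 * np.exp(1j * (0.8 * t + 0.5 * nu + 0.3))

approx = decoherence_estimate(vis, 0.0, 2.0, 1.0, 3.0)
exact = brute_force_average(vis, 0.0, 2.0, 1.0, 3.0)
```

For a sum of sources with different phase rates the same function would be applied to each source's contribution separately, since the sum as a whole does not have linear phase.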

5.3. Interferometer-based errors 

The term interferometer-based errors refers to measurement errors that cannot be represented by per-antenna terms. These are also called closure errors, since they violate the closure relationships of Sect. 4. When formulating Eq. ©, we assumed that the visibility matrix $V_{pq}$ output by the correlator is a perfect measurement of correlations between antenna voltages. Closure errors represent additional baseline-based effects. Assuming these are linear, and following Noordam (1996), we could rewrite the full-sky RIME of Eq. (13) as:

$$V_{pq} = M_{pq} * \left( J_p X_{pq} J_q^H \right) + A_{pq}, \qquad (24)$$

where $M_{pq}$ is a 2x2 matrix of multiplicative interferometer errors, $A_{pq}$ is a 2x2 matrix of additive errors, and "*" represents element-by-element (rather than matrix) multiplication.

Given a model for $X_{pq}$, observed data $V_{pq}$, and self-calibrated per-antenna terms $J_p$, it is trivial to estimate M and A using Eq. (24). It is also trivial to see that the equation is ill-conditioned: any model X can be made to fit the data by choosing suitable values for M and A. We therefore need to assume some additional constraints, such as closure errors being fixed (or only slowly varying) in time and/or frequency.
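The degeneracy is easy to demonstrate. In the sketch below (random, purely illustrative matrices that are not meant to be physical coherencies), the data are generated from one sky model, and a multiplicative closure-error term M is then fitted for a different, wrong model with A = 0. The wrong model fits the data exactly, which is why additional constraints on M and A are needed. `rime_closure` and `fit_multiplicative` are hypothetical helpers, not existing routines.

```python
import numpy as np

def rime_closure(Jp, Jq, Xpq, Mpq, Apq):
    """Eq. (24): V_pq = M_pq * (J_p X_pq J_q^H) + A_pq, with '*' element-by-element."""
    return Mpq * (Jp @ Xpq @ Jq.conj().T) + Apq

def fit_multiplicative(Vpq, Jp, Jq, Xpq):
    """Solve Eq. (24) for M_pq with A_pq = 0, element by element (requires non-zero predictions)."""
    return Vpq / (Jp @ Xpq @ Jq.conj().T)

rng = np.random.default_rng(7)

def cplx():
    """Random complex 2x2 matrix."""
    return rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

Jp, Jq = cplx(), cplx()
X_true, X_wrong = cplx(), cplx()          # two unrelated "sky models"
V_obs = rime_closure(Jp, Jq, X_true, np.ones((2, 2)), np.zeros((2, 2)))

M_fit = fit_multiplicative(V_obs, Jp, Jq, X_wrong)
residual = V_obs - rime_closure(Jp, Jq, X_wrong, M_fit, 0.0)
```

The residual vanishes even though the model is wrong: an unconstrained M simply absorbs the discrepancy.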

In practice, closure errors arise due to a combination of ef- 
fects: 



- The traditional "purely instrumental" cause is the use of analog components in the signal chain and parts of the correlator, which is typical of the previous generations of radio interferometers. New telescope designs tend to digitize the signal much closer to the receiver, and use all-digital correlators, presumably eliminating instrumental closure errors.

- Smearing and decoherence (Sect. 5.2) are baseline-based effects, and will thus manifest themselves as closure errors, unless they are properly taken into account in the model for $X_{pq}$.

- In general, any source structure or flux not represented by the model $X_{pq}$ will also show up as a closure error.



A solution for M and/or A will tend to subsume all these effects. This is dangerous, as it can actually attenuate sources in the final images, as illustrated in Paper III (Smirnov 2011b, Sect. 1.5). One must thus be very conservative with closure error solutions, lest they become just another "fudge factor" in the equations.

5.4. A three-dimensional RIME? 

Recent work by Carozzi & Woan (2009) highlights a limitation of the 2x2 Jones formalism. They point out that since we're measuring a 3D brightness distribution, the radiation from off-center sources is only approximately paraxial (equivalently, the EM waves are only approximately transverse). From this it follows that a 2D description of the EMF based on a rank-2 vector (the vector e used above) is insufficient, and a rank-3 formalism is proposed.

The main implication of the Carozzi-Woan result for the 2x2 formalism is that the latter is still valid in general (at least for dual-receptor arrays), but the full-sky RIME of Eq. (17) must be augmented with an additional direction-dependent Jones term called the xy-projected transformation matrix, designated as $T^{(xy)}$ (see their Eq. 34), which corresponds to a projection of the 3D brightness distribution onto the plane of the receptors. If all the receptors of the array are plane-parallel (Carozzi & Woan call this a plane-polarized interferometer), $T^{(xy)}$ is a trivial DDE (in the sense of Eq. 20), manifesting itself as a polarization aberration that increases with l, m (see their Fig. 2). For non-parallel receptors, $T^{(xy)}$ should be a non-trivial DDE!

Classical dish arrays are plane-polarized by design, but de- 
viate from this in practice due to pointing errors and other mis- 
alignments. The resulting effect is expected to be tiny given the 
typically narrow FoV of a dish, but it would be intriguing to 
see whether it can be detected in deliberately mispointed WSRT 
observations, given the extremely high dynamic range routinely 
achieved at the WSRT. On the other hand, an aperture array such 
as LOFAR should show a far more significant deviation from the 
plane-polarized case (due to the curvature of the Earth, as well as 
the all-sky FoV). With LOFAR's (as yet) relatively low dynamic 
range and extreme instrumental polarization, the effect may be 
challenging to detect at present. Further work on the subject is 
urgently required, given the polarization purity requirements of 
future telescopes (and in particular the SKA).

6. Alternative formulations

6.1. Mueller vs. Jones formalism 

The original paper by Hamaker et al. (1996) formulated the RIME in terms of 4x4 Mueller matrices (Mueller 1948). This is mathematically fully equivalent to the 2x2 form introduced by Hamaker (2000) in the fourth paper, and has since been adopted by many authors (Noordam 1996; Thompson et al. 2001; Bhatnagar et al. 2008; Rau et al. 2009). In my view, this is
somewhat unfortunate, as the 2x2 formulation is both simpler 
and more elegant, and has far more intuitive appeal, especially 
for understanding calibration problems. For completeness, I will 
make an explicit link to the 4x4 form here. 

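Instead of taking the matrix product of two voltage vectors $v_p$ and $v_q$ and getting a 2x2 visibility matrix, as in Eq. @, we can take the outer product of the two to get the visibility vector

$$\vec V_{pq} = 2 \langle v_p \otimes v_q^* \rangle = 2 \begin{pmatrix} \langle v_{pa} v_{qa}^* \rangle \\ \langle v_{pa} v_{qb}^* \rangle \\ \langle v_{pb} v_{qa}^* \rangle \\ \langle v_{pb} v_{qb}^* \rangle \end{pmatrix}$$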


Combining this with Eq. ©, we get

$$\vec V_{pq} = 2 \left\langle (J_p \otimes J_q^*)(e \otimes e^*) \right\rangle = (J_p \otimes J_q^*) \begin{pmatrix} I+Q \\ U+iV \\ U-iV \\ I-Q \end{pmatrix},$$

which then gives us the 4x4 form of Eq. (8):

$$\vec V_{pq} = (J_p \otimes J_q^*)\, S \vec I = \mathcal{J}_{pq} S \vec I \qquad (25)$$



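Here, $\mathcal{J}_{pq} = J_p \otimes J_q^*$ is a 4x4 matrix describing the combined effect of the signal paths to antennas p and q, $\vec I$ is a column vector of the Stokes parameters (I, Q, U, V), and S is a conversion matrix that turns the Stokes vector into the brightness vector:

$$\begin{pmatrix} I+Q \\ U+iV \\ U-iV \\ I-Q \end{pmatrix} = S \begin{pmatrix} I \\ Q \\ U \\ V \end{pmatrix}, \qquad S = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & i \\ 0 & 0 & 1 & -i \\ 1 & -1 & 0 & 0 \end{pmatrix}$$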
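The equivalence between Eq. (8) and its 4x4 form (25) can be checked with a Kronecker product. Flattening a 2x2 matrix row by row gives exactly the ordering (aa, ab, ba, bb) of the visibility vector, and for that ordering $(J_p \otimes J_q^*)\,\mathrm{vec}(B) = \mathrm{vec}(J_p B J_q^H)$. The sketch below uses random Jones matrices and arbitrary Stokes values; it also forms the equivalent Mueller matrix $S^{-1}\mathcal{J}_{pq}S$, which acts directly on Stokes vectors. Function names are illustrative.

```python
import numpy as np

# S maps the Stokes vector (I, Q, U, V) to the brightness vector (I+Q, U+iV, U-iV, I-Q)
S = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1j],
              [0, 0, 1, -1j],
              [1, -1, 0, 0]], dtype=complex)

def vis_2x2(Jp, Jq, stokes):
    """Eq. (8): V_pq = J_p B J_q^H, with B in the sign convention of Eq. (7)."""
    I, Q, U, V = stokes
    B = np.array([[I + Q, U + 1j * V], [U - 1j * V, I - Q]])
    return Jp @ B @ Jq.conj().T

def vis_4x4(Jp, Jq, stokes):
    """Eq. (25): visibility vector = (J_p kron J_q^*) S I."""
    return np.kron(Jp, Jq.conj()) @ S @ np.asarray(stokes, dtype=complex)

def mueller(Jp, Jq):
    """Equivalent Mueller matrix S^-1 (J_p kron J_q^*) S, acting on Stokes vectors."""
    return np.linalg.solve(S, np.kron(Jp, Jq.conj()) @ S)

rng = np.random.default_rng(3)
Jp = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Jq = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
stokes = (1.0, 0.1, -0.05, 0.02)
```

Both routes give the same four numbers; the 2x2 route simply avoids forming the 4x4 Kronecker product.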



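The equivalent of the "onion" form of Eq. (O is then:

$$\vec V_{pq} = (J_{pn} \otimes J_{qn}^*) \cdots (J_{p1} \otimes J_{q1}^*)\, S \vec I = \mathcal{J}_{pq,n} \cdots \mathcal{J}_{pq,1}\, S \vec I \qquad (26)$$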



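Likewise, the full-sky RIME of Eq. (18) can be written in the 4x4 form as:

$$\vec V_{pq} = \mathcal{G}_{pq} \iint\limits_{lm} \frac{1}{n}\, \mathcal{E}_{pq}(l, m)\, S \vec I(l, m)\, e^{-2\pi i (u_{pq} l + v_{pq} m + w_{pq}(n-1))}\, dl\, dm \qquad (27)$$

where $\mathcal{G}_{pq} = G_p \otimes G_q^*$ and $\mathcal{E}_{pq} = E_p \otimes E_q^*$.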



This form of the RIME is particularly favoured when describing imaging problems (Bhatnagar et al. 2008; Rau et al. 2009). It emphasizes that an interferometer performs a linear operation on the sky distribution $\vec I(l, m)$, via the linear operators $\mathcal{G}_{pq}$, $\mathcal{E}_{pq}(l, m)$, and the Fourier Transform $\mathcal{F}$, while eliding the internal structure of $\mathcal{G}$ and $\mathcal{E}$.



15 A Mueller matrix represents a linear operation on Stokes vectors, and so does not explicitly appear in these equations. For Eq. (25), the equivalent Mueller matrix is $S^{-1} \mathcal{J}_{pq} S$.



On the other hand, if we're interested in the underlying 
physics of signal propagation (as is often the case for calibration 
problems), then the 4x4 form of the RIME becomes extremely 
opaque. When considering any specific set of propagation effects 
(and its corresponding Jones chain), the outer product operation 
turns simple-looking 2x2 Jones matrices into an intractable sea of indices; see Bhatnagar et al. (2008, Eq. 4) and Hamaker et al. (1996, Appendix A) for typical examples. The 2x2 form provides a more transparent description of calibration problems, and for this reason is also far better suited to teaching the RIME. An excellent example of this transparency is given in Paper II (Smirnov 2011a, Sect. 2.2.2), where I consider the effect of differential Faraday rotation.

There are also potential computational issues raised by the 4x4 formalism. A naive implementation of, e.g., Eq. (26) incurs a series of 4x4 matrix multiplications for each interferometer and time/frequency point. Counting each complex multiplication or addition as one floating-point operation (flop), multiplication of two 4x4 matrices costs 112 flops, and the outer product operation another 16. Therefore, each pair of Jones terms in the chain incurs 128 flops. The same equation in 2x2 form invokes 12 flops per matrix multiplication, or 24 per pair of Jones terms. This is roughly 5 times fewer than the 4x4 case.
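The operation counts quoted above follow from elementary arithmetic, treating each complex multiplication or addition as one operation. The short sketch below reproduces them; the helper names are illustrative only.

```python
def matmul_ops(n):
    """Operations in a dense n x n matrix product: n^3 multiplications + n^2 (n - 1) additions."""
    return n**3 + n**2 * (n - 1)

def kron_ops(n):
    """Multiplications in the Kronecker (outer) product of two n x n matrices."""
    return n**4

def ops_per_jones_pair_4x4():
    # form J_pi (x) J_qi^* (2x2 (x) 2x2 -> 4x4), then one 4x4 product with the accumulated term
    return kron_ops(2) + matmul_ops(4)

def ops_per_jones_pair_2x2():
    # J_pi applied on the left and J_qi^H on the right: two 2x2 products
    return 2 * matmul_ops(2)

ratio = ops_per_jones_pair_4x4() / ops_per_jones_pair_2x2()
```

This gives 128 versus 24 operations per pair of Jones terms, a ratio of about 5.3.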

Often, the true computational bottleneck lies elsewhere, i.e. 
in solving (for calibration) or gridding (for imaging), in which 
case these considerations are irrelevant. However, when running 
massive simulations (that is, using the RIME to predict visibil- 
ities), my profiling of MeqTrees has often shown matrix multi- 
plication to be the major consumer of CPU time. In this case, 
implementing calculations using the 2x2 form represents a sig- 
nificant optimization. 

6.2. Jones-specific formulations 

Formulations of the RIME such as Eqs. (18) or (16) are entirely general and non-specific, in the sense that they allow for
any combination of propagation effects to be inserted in place 
of the G and E terms. A specific formulation may be obtained 
by inserting a particular sequence of Jones matrices. The first 
RIME paper (Hamaker et al. 1996) already suggested a specific Jones chain. This was further elaborated on by Noordam (1996), and eventually implemented in AIPS++, which subsequently became CASA. The Jones chain used by current versions of CASA is described by Myers et al. (2010, Appendix E.1):



$$J_p = B_p G_p D_p E_p P_p T_p \qquad (28)$$



The Jones matrices given here correspond to particular effects in the signal chain, with specific parameterizations (e.g. $B_p$ is a frequency-variable bandpass, $G_p$ is time-variable receiver gain, etc.). Other authors (Rau et al. 2009) suggest variations on this theme.
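One way to keep an implementation from being tied to a particular chain like Eq. (28) is to treat the chain as data: an ordered list of arbitrary 2x2 terms, composed by matrix multiplication. The sketch below is a minimal illustration of that idea, not a description of how CASA or any other package is structured; the terms used (a diagonal gain, a leakage-like matrix, and an extra rotation) and their values are hypothetical placeholders.

```python
from functools import reduce
import numpy as np

def compose(chain):
    """J_p = J_p1 J_p2 ... J_pk for a chain listed left to right as written in the equation."""
    return reduce(np.matmul, chain, np.eye(2, dtype=complex))

def visibility(chain_p, chain_q, B):
    """V_pq = J_p B J_q^H for arbitrary (possibly different-length) Jones chains."""
    return compose(chain_p) @ B @ compose(chain_q).conj().T

def rotation(chi):
    """Real rotation matrix, e.g. a parallactic-angle-like term (illustrative)."""
    c, s = np.cos(chi), np.sin(chi)
    return np.array([[c, -s], [s, c]], dtype=complex)

G = np.diag([1.2 * np.exp(0.2j), 0.8 * np.exp(-0.4j)])      # hypothetical diagonal gain
D = np.array([[1.0, 0.02 + 0.01j], [-0.03j, 1.0]])          # hypothetical leakage-like term
B = np.diag([1.0, 1.0]).astype(complex)                      # unpolarized unit source

# Adding a term that was not anticipated is just a longer list
V_basic = visibility([G, D], [G, D], B)
V_extra = visibility([G, D, rotation(0.3)], [G, D, rotation(0.3)], B)
```

In this toy case the extra rotation leaves the visibility unchanged, because the same rotation on both antennas commutes with an unpolarized B; the point is only that the chain can be extended without changing the evaluation code.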

Such a "Jones-specific" approach has considerable merit, 
in that it shows how different real-life propagation effects fit 
together, and gives us something specific to be thought about 
and implemented in software. It does have a few pitfalls which 
should be pointed out. 

The first pitfall of this approach is that it tends to place the 
trees firmly before the forest. A major virtue of the RIME is its 
elegance and simplicity, but this gets obscured as soon as elaborate chains of Jones matrices are written out. I submit that the RIME's slow acceptance among astronomers at large is, in some part, due to the literature being full of equations similar to (28). That they are just specific cases of what is at core a very simple and elegant equation is a point perhaps so obvious that some authors do not bother noting it, but it cannot be stressed enough!

The second pitfall is that an equation like (28), when implemented in software, can be both too specific and insufficiently flexible. (Note that the CASA implementation specifies both the time/frequency behaviour and the form of the Jones terms, e.g. G is diagonal and variable in time, B is diagonal and variable in frequency, D has a specific "leakage" form, etc.) For instance, the calibration described in Paper III (Smirnov 2011b) cannot be done in CASA, despite using an ostensibly much simpler form of the RIME, because it includes a Jones term that was not anticipated in the CASA design. A second major virtue of the RIME is its ability to describe different propagation effects; this is immediately compromised if only a specific and limited set of these is chosen for implementation.

A final pitfall of the Jones-specific view is that it tends to stereotype approaches to calibration. Equation (28) is a huge improvement on the ad hoc approaches of older software systems, but in the end it is just some model of an interferometer that happens to work well enough for "classically-designed" instruments such as the VLA and WSRT, in their most common regimes. It is not universally true that polarization effects can be completely described by a direction-independent leakage matrix ($D_p$), or bandpass by $B_p$; it just happens to be a practical first-order model, which completely breaks down for a new instrument such as LOFAR, where e.g. "leakage" is strongly direction-dependent. In fact, even WSRT results can be improved by departing from this model, as Paper III (Smirnov 2011b) will show. We must therefore take care that our thinking about calibration does not fall into a rut marked out by a specific series of Jones terms.



6.3. Circular vs. linear polarizations 

In Sect. Q] I mentioned that the RIME holds in any coordinate system. Hamaker et al. (1996) briefly discussed coordinate
transforms in this context, but a few additional words on the sub- 
ject are required. 

Field vectors e and Jones matrices J may be represented [by a particular set of complex values] in any coordinate system, by picking a pair of complex basis vectors in the plane orthogonal to the direction of propagation. I have used an orthonormal xy system until now. Another useful system is that of circular polarization coordinates rl, whose basis vectors (represented in the xy system) are $e_r = \frac{1}{\sqrt 2}(1, -i)$ and $e_l = \frac{1}{\sqrt 2}(1, i)$. Any other pair of basis vectors may of course be used. In general, for any two coordinate systems S and T, there will be a corresponding 2x2 conversion matrix T, such that $e_T = T e_S$, where $e_S$ and $e_T$ represent the same vector in the S and T coordinate systems. Likewise, the representation of the linear operator J transforms as $J_T = T J_S T^{-1}$, while the brightness matrix B (or indeed any coherency matrix) transforms as $B_T = T B_S T^H$.
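These transformation rules can be checked numerically. The sketch below builds the xy-to-rl conversion matrix from the basis vectors given above, assuming (as here) an orthonormal target basis, so that the rows of the conversion matrix are the conjugated basis vectors. It then transforms a brightness matrix and a rotation, and confirms that a full visibility transforms like any other coherency matrix. Function names and numerical values are illustrative.

```python
import numpy as np

e_r = np.array([1, -1j]) / np.sqrt(2)    # circular basis vectors, expressed in the xy frame
e_l = np.array([1, 1j]) / np.sqrt(2)
T = np.vstack([e_r.conj(), e_l.conj()])  # xy -> rl conversion: e_rl = T e_xy

def jones_to(T, J):
    """Representation of a linear operator in the new frame: T J T^-1."""
    return T @ J @ np.linalg.inv(T)

def coherency_to(T, B):
    """Representation of a coherency (e.g. brightness) matrix in the new frame: T B T^H."""
    return T @ B @ T.conj().T

I, Q, U, V = 1.0, 0.1, -0.05, 0.02
B_xy = np.array([[I + Q, U + 1j * V], [U - 1j * V, I - Q]])
B_rl = coherency_to(T, B_xy)

chi = 0.4                                 # rotation angle (e.g. Faraday rotation), illustrative
R_xy = np.array([[np.cos(chi), -np.sin(chi)], [np.sin(chi), np.cos(chi)]])
R_rl = jones_to(T, R_xy)

rng = np.random.default_rng(5)
Jp = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Jq = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
V_xy = Jp @ B_xy @ Jq.conj().T
V_rl = jones_to(T, Jp) @ B_rl @ jones_to(T, Jq).conj().T
```

In the rl frame, B takes the form with I+V and I-V on the diagonal and Q±iU off it, and the rotation becomes diagonal, $\mathrm{diag}(e^{i\chi}, e^{-i\chi})$. This is the property exploited when mixing coordinate frames within the RIME.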

Of particular importance is the matrix for conversion from 
linear to circularly polarized coordinates. This matrix is com- 
monly designated as H (being the mathematical equivalent of 
an electronic hybrid sometimes found in antenna receivers):

$$H = \frac{1}{\sqrt 2} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}, \qquad H^{-1} = H^H = \frac{1}{\sqrt 2} \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}$$

Consequently, the brightness matrix B, when represented in circular polarization coordinates, has the following form (I'll use the indices "⊙" and "+" where necessary to disambiguate between circular and linear representations):

$$B_\odot = H B_+ H^H = \begin{pmatrix} I+V & Q+iU \\ Q-iU & I-V \end{pmatrix}$$

While EMF vectors and Jones matrices may be represented using an arbitrary basis, the receptor voltages we actually measure are specific numbers. The voltage measurement process thus implies a preferred coordinate system, i.e. circular for circular receptors, and linear for linear receptors.

It is of course possible to convert measured data into a different coordinate frame after the fact. It is also perfectly possible, and indeed may be desirable, to mix coordinate systems within the RIME, by inserting appropriate coordinate conversion matrices into the Jones chain. A commonly encountered assumption is that a "VLA RIME" must be written down in circular coordinates and a "WSRT RIME" in linear, but this is by no means a fundamental requirement! We're free to express part of the signal propagation chain in one coordinate frame, then insert conversion matrices at the appropriate place in the equation to switch to a different coordinate frame. In the onion form of the RIME (Eq. [5), this corresponds to a change of coordinate systems as we go from one layer of the onion to another. For example:

$$V_{pq} = G_p H^{-1} \left( J_{p\odot}\, B_\odot\, J_{q\odot}^H \right) H^{-H} G_q^H,$$

where the terms inside the brackets are expressed in circular coordinates, and the receiver gains $G_p$ in linear coordinates.

One reason to consider the use of mixed coordinate systems is the opportunity to optimize the representation of particular physical effects. As an example, a rotation in the xy frame (e.g. ionospheric Faraday rotation, or parallactic angle) is represented by a diagonal matrix in the rl frame. If the observed field has no intrinsic linear polarization, the B matrix is also diagonal. If a part of the RIME is known to contain diagonal matrices only, their product can be evaluated with significant computational savings (compared to the full 2x2 matrix regime). On the other hand, if the instrument is using linear receptors, then receiver gains (G) should be expressed in the linear frame, lest calibrating them become extremely awkward. We should therefore implement the RIME somewhat like the above equation, with the appropriate H matrices inserted as "late" in the chain as possible, so that only the minimum amount of computation is done for the full 2x2 case. This approach is not yet exploited by any existing software, but perhaps it should be. In particular, the MeqTrees system (Noordam & Smirnov 2010) automatically optimizes internal calculations when only diagonal matrices are in play, and would provide a suitable vehicle for exploring this technique.

Note that the configuration matrix C proposed by Hamaker et al. (1996), and further discussed by Noordam (1996), plays a similar role, in that it converts from "antenna frame" to "voltage frame". Here I simply suggest a generalization of this line of thinking. The RIME allows for an arbitrary mix of coordinate frames, as long as the appropriate conversion matrices are inserted in their rightful places.¹⁶

16 Nor should we restrict our thinking to just the xy and rl frames. It could well be that the RIME of a future instrument will turn out to have a particularly elegant form in some other coordinate basis.

7. Errors and controversies

For all its elegance, even the simplest version of the RIME (e.g. as formulated in Sect. 1.3) contains two points of confusion and controversy. The first has to do with the sign of the iV term, and the second with the factors of 2 in the definition of $V_{pq}$ and B.



7.1. Sign of Stokes V 

The sign of Stokes V has been a perennial source of confusion. The IAU (1973) definition specifies that V is positive for right-hand circular polarization, but the literature is littered with papers adopting the opposite convention. Fortunately, major software packages such as AIPS and MIRIAD follow the IAU definition (though this has not always been the case for their early versions). As for the iV term in the RIME, Papers I and II of the original series (Hamaker et al. 1996; Sault et al. 1996) used the sign convention of Eq. (7). In Paper III of the series, Hamaker & Bregman (1996) then discussed the issue in detail, and showed that this convention is "correct" in the sense of following from the IAU definitions for Stokes V and standard coordinate systems. However, in Paper IV, Hamaker (2000) then used the opposite sign convention! In Paper V, Hamaker (2006) noted the inconsistency, yet persisted in using the opposite convention.

For this series, I adopt the correct sign convention of the original RIME Papers I through III, as per Eq. (7).

In practice, few radio astronomers concern themselves with circular polarization, which is perhaps why the confusion has been allowed to fester. Unfortunately, this also means that in the rare cases when the sign of V is important, it must be fastidiously checked each time!



7.2. Factors of 2, or what is the unit response of an ideal 
interferometer? 

A far more insidious issue is the factor of 2 in Eqs. © and (0. This has been the subject of a long-standing controversy both in the literature and in software. The definition of Stokes I in terms of the complex amplitudes of the electric field is quite unambiguous (Thompson et al. 2001; Born & Wolf 1964). In particular:

$$I = \langle |e_x|^2 \rangle + \langle |e_y|^2 \rangle, \qquad Q = \langle |e_x|^2 \rangle - \langle |e_y|^2 \rangle.$$

This implies that a unit source of I = 1, Q = U = V = 0 corresponds to complex amplitudes of $\langle |e_x|^2 \rangle = \langle |e_y|^2 \rangle = 1/2$. What is less clear is how to relate this to the outputs of a correlator. That is, given an ideal interferometer and a unit source at the phase centre, what visibility matrix $V_{pq}$ should we expect to see? (In other words, what is the gain factor of an ideal interferometer?) This is something for which no unambiguous definition exists. Historically, two conventions have emerged:



Convention-1/2. Unity correlations correspond to unity complex
amplitudes, so a 1 Jy source produces correlations of 1/2 each:

$$
V_{pq} = \begin{pmatrix} \langle |e_x|^2 \rangle & 0 \\ 0 & \langle |e_y|^2 \rangle \end{pmatrix}
= \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
$$

Convention-1. Unity correlations correspond to unity Stokes I:

$$
V_{pq} = 2 \begin{pmatrix} \langle |e_x|^2 \rangle & 0 \\ 0 & \langle |e_y|^2 \rangle \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
$$

Convention-1/2 is somewhat more pleasing to the purists,
as it retains standard physical units for visibilities. This is
the convention used throughout the RIME papers, beginning
with Hamaker et al. (1996), and also originally adopted in
the MeqTrees system (Noordam & Smirnov 2010). However,
Convention-1 is by far the more widespread, having been
adopted by AIPS and other software systems, which has caused
it to become entrenched in the minds of most radio astronomers.

The first edition of what is effectively the
main reference work of radio interferometry,
Thompson, Moran & Swenson Jr. (1986), had a factor of
1/2 in the equations for interferometer response (Eq. 4.46),
but omitted it in Table 4.47. (I conjecture that this table may
in fact be the origin of Convention-1!) By the time of the
second edition, Convention-1 was already widespread, and the
authors responded by dropping the factor of 1/2 after Eq. (4.29),
noting that it was "omitted and considered to be subsumed
within the overall gain factor" (Thompson et al. 2001, see p.
102). For better or for worse, this has irrevocably consecrated
Convention-1 as the one to follow.

Ultimately, flux scales are tied to known calibrator sources,
whose brightnesses are quite unambiguously defined in units of
janskys. This means that in practice, the factor of 2 is indeed
quietly subsumed into the gain calibration. Problems arise when
data is moved between software packages that follow different
conventions. For example, data calibrated with MeqTrees
(formerly using Convention-1/2) is kept in a Measurement Set (MS),
yet the only tool available for making images from an MS is
the AIPS++/CASA imager (Convention-1). This has often
resulted in images with fluxes that were off by a factor of 2, so the
MeqTrees project has recently switched to Convention-1.

In this paper, I have taken the difficult decision of breaking
with the original formulations, and recasting the RIME using
Convention-1. There remains the question of where to inject the
requisite factor of 2. I have decided to do it "on the inside", by
dropping the factor of 1/2 from the Hamaker (2000) definition of
the brightness matrix B (Eq. 7). The alternative was to add a
factor of 2 to the "outside" of the equation. The "inside" approach
appears to have a number of practical advantages:

- B becomes unity for a unit (1 Jy unpolarized) source.

- The coherency of a point source at the phase centre
(Sect. 1.7) becomes equivalent to its brightness (and not one-half
of its brightness).

- In the "onion" form of the ME (Eq. 9), each successive layer
of the onion corresponds to measurable visibilities, without
needing to carry an explicit factor of 2 around.



8. Conclusions



Since its original formulation by Hamaker et al. (1996), the
Radio Interferometer Measurement Equation (RIME) has pro- 
vided the mathematical underpinnings for novel calibration 
methods and algorithms. Besides its explanatory power, the 
RIME formalism can be wonderfully simple and intuitive; this 
fact has become somewhat obscured by the many different di- 
rections that it has been taken in. Several authors have devel- 
oped approaches to the DDE problem based on the RIME, using 
different (but mathematically equivalent) versions of the formal- 
ism. This paper has attempted to reformulate these using one 
consistent 2×2 formalism, in preparation for follow-up papers (II
and III) that will put it to work. Finally, a number of
misunderstandings and controversies have inevitably accrued
to the RIME over the years. Some of these have been addressed 
here. It is hoped that this paper has gone some way to making 
the RIME simple again. 